Information extraction – Automatically extracting structured information from unstructured or semi-structured machine-readable documents, such as human language texts
tf–idf – (term frequency–inverse document frequency) a numerical statistic intended to reflect how important a word is to a document in a collection or corpus
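
As an illustration of the tf–idf entry, the following is a minimal sketch of one common variant: term frequency is the raw count divided by document length, and inverse document frequency is the unsmoothed natural log of (number of documents / number of documents containing the term). The function name `tf_idf` and the three-sentence toy corpus are assumptions made for this example; libraries and textbooks often use different smoothing or normalization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(N / df), with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, "the" occurs in every document, so its idf is ln(3/3) = 0 and its weight vanishes, while "mat" occurs in only one document and therefore outweighs "cat", which occurs in two.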